On this page

Skip to content

A Brief Discussion on C#'s GetHashCode()

Last week, while a colleague was sharing information about Bloom Filters, it reminded me of key comparisons in Dictionary, as both rely on hash values to quickly determine if an element exists. Of course, while both use an array structure and hash values as indices, their specific implementations differ.

GetHashCode()

GetHashCode() is a method used to generate an integer hash value, which is primarily used to compare whether object values are equal. Implementing GetHashCode() must adhere to the following principles:

  • If two objects are considered equal (Equals() returns true), GetHashCode() must return the same value.
  • If two objects are not equal, GetHashCode() does not necessarily need to return different values.
  • GetHashCode() should not throw exceptions.

The GetHashCode() method is commonly used in the following scenarios:

  • IDictionary<TKey, TValue> and Dictionary implementation classes: Such as Dictionary<TKey, TValue> and HashTable, these classes use GetHashCode() for fast key comparison.
  • ISet<T> implementation classes: Such as HashSet<T>, which uses GetHashCode() to determine element uniqueness.
  • LINQ's Distinct() method: Uses GetHashCode() to remove duplicate elements.

If you override Equals() but do not override GetHashCode(), these classes may still consider objects unequal even if they are equal according to Equals(), leading to incorrect behavior. In fact, when you only override Equals(), Visual Studio will display the following warning.

gethashcode overwrite warning

Test using the following code:

csharp
Test test1 = new() {
    Name = "王大明",
    Age = 10
};

Test test2 = new() {
    Name = "王大明",
    Age = 10
};

Dictionary<Test, string> dic = new() {
    [test1] = "測試"
};

Console.WriteLine(test1.Equals(test2));
Console.WriteLine(dic.ContainsKey(test2));


public class Test {
    public string Name { get; set; }

    public int Age { get; set; }

    public override bool Equals(object obj) => obj is Test test && Name == test.Name && Age == test.Age;
}

The results are as follows:

text
True
False

Implementation of GetHashCode()

The simplest way to implement it is to use Visual Studio's refactoring feature. Take the following code as an example:

csharp
public class Test {
    public string Name { get; set; }

    public int Age { get; set; }
}

Click on the Test class, press ALT + Enter to open the refactoring options, and select "Generate Equals and GetHashCode...".

vs generate equals menu

Select the members to be used for Equals() evaluation. You can also choose to implement IEquatable<T> and operators (== and !=) simultaneously; we will leave these extra items unchecked for now:

vs generate equals dialog

The generated code is as follows:

csharp
public class Test {
    public string Name { get; set; }

    public int Age { get; set; }

    public override bool Equals(object obj) => obj is Test test && Name == test.Name && Age == test.Age;
    public override int GetHashCode() => HashCode.Combine(Name, Age);
}

However, since HashCode.Combine() was added in .NET Core 2.1, it is not supported in .NET Framework. For .NET Framework versions, the generated code might look like this:

csharp
 public class Test {
    public string Name { get; set; }

    public int Age { get; set; }

    public override bool Equals(object obj) => obj is Test test && Name == test.Name && Age == test.Age;

    public override int GetHashCode() {
        int hashCode = -1360180430;
        hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Name);
        hashCode = hashCode * -1521134295 + Age.GetHashCode();
        return hashCode;
    }
}

Code explanation:

  • -1360180430 and -1521134295 are constants used to initialize and combine hash codes to produce a more uniformly distributed hash value. These constants are usually large prime numbers, as prime numbers help reduce the probability of collisions in hash functions.
    • -1360180430: This is the initial hash code value. By using a non-zero starting value (usually a prime number), it ensures that the object's hash code differs from the default value and that it does not return zero as a hash code even if the object has no properties.
    • -1521134295: This is the multiplier used to combine hash values. Using a prime multiplier to mix the hash values of individual fields increases the randomness of the result, thereby reducing the probability of hash collisions.
  • The reason for using EqualityComparer<T>.Default to calculate hash values for reference objects:
    • To avoid null values; when the value is null, it returns 0.
    • To ensure that the same hash calculation rules are used across objects of different classes.

Of course, this .NET Framework version is not easy to memorize. For .NET Framework versions that do not support HashCode.Combine(), some developers might use anonymous objects to obtain GetHashCode(). Naturally, this approach might slightly impact performance due to the creation of temporary anonymous objects.

csharp
public override int GetHashCode() => new { Name, Age }.GetHashCode();

Implementation of ContainsKey(TKey key)

I will briefly explain this here. If you want to know the complete implementation, the .NET 8 source code is excerpted below for your own study.

The ContainsKey() method internally uses the FindValue() method to search for the existence of the specified TKey. If the corresponding value is found, it means the TKey exists in the Dictionary<TKey, TValue>. Dictionary has two main array fields:

  • _entries: Type Entry[], where Entry contains TKey, TValue, and the index of the next entry with the same bucket value.
  • _buckets: Type int[], which stores the mapping table between each bucket value (the value calculated from the TKey's HashCode after modulo operations) and the corresponding entry index.

The FindValue() implementation process is as follows:

  1. Calculate the HashCode of the passed key parameter. This HashCode is used to quickly locate the corresponding entry index in the _buckets array. Through this index, the corresponding entry can be retrieved directly from the _entries array, avoiding sequential searching and improving search efficiency.
  2. First, compare the TKey's HashCode with the HashCode stored in the entry. This allows for quick filtering of non-matching items, reducing the number of calls to the Equals() method. In most cases, the computational cost of Equals() is higher than that of Hash comparison, so this further improves performance.
  3. Since different objects may have the same HashCode, and the index in _buckets is the result of a calculation, the entry index obtained from _buckets might not necessarily point to the desired item. If a mismatch is found, entry.next is used to point to the index of the next item with the same bucket, continuing the comparison until a match is found or all possible items have been checked.

The following is the excerpted implementation of FindValue() in .NET 8:

csharp
internal ref TValue FindValue(TKey key) {
    if (key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    ref Entry entry = ref Unsafe.NullRef<Entry>();
    if (_buckets != null) {
        Debug.Assert(_entries != null, "expected entries to be != null");
        IEqualityComparer<TKey>? comparer = _comparer;
        if (typeof(TKey).IsValueType && // comparer can only be null for value types; enable JIT to eliminate entire if block for ref types
            comparer == null) {
            uint hashCode = (uint)key.GetHashCode();
            int i = GetBucket(hashCode);
            Entry[]? entries = _entries;
            uint collisionCount = 0;

            // ValueType: Devirtualize with EqualityComparer<TKey>.Default intrinsic
            i--; // Value in _buckets is 1-based; subtract 1 from i. We do it here so it fuses with the following conditional.
            do {
                // Should be a while loop https://github.com/dotnet/runtime/issues/9422
                // Test in if to drop range check for following array access
                if ((uint)i >= (uint)entries.Length) {
                    goto ReturnNotFound;
                }

                entry = ref entries[i];
                if (entry.hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entry.key, key)) {
                    goto ReturnFound;
                }

                i = entry.next;

                collisionCount++;
            } while (collisionCount <= (uint)entries.Length);

            // The chain of entries forms a loop; which means a concurrent update has happened.
            // Break out of the loop and throw, rather than looping forever.
            goto ConcurrentOperation;
        } else {
            Debug.Assert(comparer is not null);
            uint hashCode = (uint)comparer.GetHashCode(key);
            int i = GetBucket(hashCode);
            Entry[]? entries = _entries;
            uint collisionCount = 0;
            i--; // Value in _buckets is 1-based; subtract 1 from i. We do it here so it fuses with the following conditional.
            do {
                // Should be a while loop https://github.com/dotnet/runtime/issues/9422
                // Test in if to drop range check for following array access
                if ((uint)i >= (uint)entries.Length) {
                    goto ReturnNotFound;
                }

                entry = ref entries[i];
                if (entry.hashCode == hashCode && comparer.Equals(entry.key, key)) {
                    goto ReturnFound;
                }

                i = entry.next;

                collisionCount++;
            } while (collisionCount <= (uint)entries.Length);

            // The chain of entries forms a loop; which means a concurrent update has happened.
            // Break out of the loop and throw, rather than looping forever.
            goto ConcurrentOperation;
        }
    }

    goto ReturnNotFound;

ConcurrentOperation:
    ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
ReturnFound:
    ref TValue value = ref entry.value;
Return:
    return ref value;
ReturnNotFound:
    value = ref Unsafe.NullRef<TValue>();
    goto Return;
}

Change Log

  • 2024-09-01 Initial document creation.